Designing a New Bloom Filter-based Index for Distributed Data Management

Authors

  • Tiejian LUO
  • Zhu WANG
  • Fuxing CHENG
  • Xin ZHANG
  • Xiang WANG
Abstract

Distributed architectures are widely used in Internet applications. In such systems, a key challenge is maintaining an index data structure that records which elements each node holds. The Bloom filter (BF) is a popular solution: its simple mathematical form offers a fast, space-efficient probabilistic representation of set membership. In many Internet applications, user access to items follows Zipf's law, where a small number of items attract a large share of the visits. Based on this observation, we propose a selective insertion method for Bloom filters that reduces the workload of the BFs by finding an optimal load ratio. Experiments show that the new approach reduces the false lookup time by 36% compared with the pure Bloom filter approach.
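The abstract does not say how items are ranked or how the load ratio is chosen, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that items arrive already sorted by popularity and that the load ratio is simply the fraction of the most popular items inserted into the filter. The names `BloomFilter`, `selective_insert`, and `load_ratio` are illustrative and do not come from the paper.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings with m bit slots and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit slot, for clarity

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def selective_insert(bf: BloomFilter, items_by_popularity: list, load_ratio: float) -> int:
    """Insert only the top `load_ratio` fraction of items (assumed ranking)."""
    cutoff = int(len(items_by_popularity) * load_ratio)
    for item in items_by_popularity[:cutoff]:
        bf.add(item)
    return cutoff


# Hypothetical data: 1000 items, most popular first; keep the top 20%.
items = [f"item{i}" for i in range(1000)]
bf = BloomFilter(m=4096, k=3)
inserted = selective_insert(bf, items, load_ratio=0.2)
```

Inserting fewer items leaves fewer bits set, which lowers the false positive rate for a fixed filter size. The cost is that a lookup for an item that was never inserted has to be resolved some other way, and this sketch does not model that fallback.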


Similar resources

A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems

The indexing technique in a distributed object storage system is a crucial part of a large-scale application, where the index data structure may be published to many nodes. This raises the problem of preserving the privacy of the ownership information while supporting queries on item locations with limited index space. Probabilistic data structure, such as the bloom filter which records the locat...


A Cuckoo Filter Modification Inspired by Bloom Filter

Probabilistic data structures are popular in membership queries, network applications, and similar settings. The Bloom filter and the Cuckoo filter are two popular space-efficient models that are incorporated into the set-membership checking part of many important protocols. They are compact representations of data that use hash functions to randomize a set of items. Being able to store more elements while keeping a reaso...


BSI: Bloom Filter-based Semantic Indexing for Unstructured P2P Networks

Resource management and search are very important yet challenging in large-scale distributed systems such as P2P networks. Most existing P2P systems rely on indexing to route queries efficiently over the network. However, searches based on such indices face two key issues. First, the majority of existing search schemes rely on simple keyword-based indices that can only support exact string based m...


LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data

We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial query operators, e.g., range search, kNN, spatio-textual operation, spatial-join, and kNN-join. To achieve high performance, LocationSpark employs various spatial indexes for in-memory data, and guarantees that immu...


COCA Filters: Co-occurrence Aware Bloom Filters

We propose an indexing data structure based on a novel variation of Bloom filters. Signature files have been proposed in the past as a method to index large text databases, though they suffer from a high false positive rate. In this paper we introduce COCA Filters, a new type of Bloom filter that exploits the co-occurrence probability of words in documents to reduce the false positive...




Publication date: 2014